The CICA Windows Explosion!




Text File | 1993-06-08 | 45KB | 1,531 lines

fc - A Public Domain FORTH Cross Compiler for the RTX2000 Microcontroller
Version 1.14

fc is a FORTH compiler that generates machine code for the Harris RTX2000 microcontroller chip. fc accepts text files generated by a text editor and produces optimized object code files for the RTX2000. fc has options for generating annotated object code listings and cross reference files, which can be incorporated into other programs to link variable and word definitions. fc can also be used to produce code for the Harris RTX2010; RTX2010 specific instructions can be added using the #macro and ucode features.

fc currently runs on an AMIGA or IBM PC compatible computer with either a hard disk or a ram disk. fc has been tested under AmigaDos 1.3 and MS-DOS 5.0, but may work with other versions of either operating system.

fc was made possible by the Public Domain version of Berkeley yacc available on Fred Fish Disk #419. Many thanks to Bob Corbett and those who helped make this public domain version of yacc. I would also like to thank John Goldsten and Augie Mattheiss for their help in debugging and testing the compiler. John, Augie, and the author have developed several programs using the MS-DOS and AMIGA versions of the compiler that have been successfully executed on RTX2000 target systems. We normally use the -o output option, or the -e output option for EPROMs. The EPROM output option has been successfully used with a DATAIO UNISITE programmer.

fc is a Public Domain program, so no fee should be charged for distribution except for a possible media charge. fc was developed using Aztec C for AMIGA version 3.6A with the default 16 bit integers. fc has been ported to MS-DOS, and the MS-DOS executable is included in the distribution. That version is identical to the AMIGA version except for some details pertaining to memory allocation and the naming of some of the source files. An experimental Macintosh version may also be included in the distribution; it requires MPW.
fc is currently being used for software development, so future revisions are likely. We already have several months of experience producing and executing code generated by fc. However, fc is offered "as is" and will not necessarily be supported. Anyone wishing to develop large programs for the RTX2000 or RTX2010 may wish to look into the development systems and cross compilers marketed by several vendors.

A test file is included with the distribution to demonstrate code production by the compiler. Its output has been checked against the instruction set listed in the Harris RTX2000 Programmer's Reference Manual. However, users who run into really puzzling problems while debugging their programs should check the generated code in a disassembly listing. We have developed several programs up to 4K bytes long and have not run into any code production problems in the last couple of months.

Background

The RTX2000 microcontroller is a 16 bit microprocessor architecturally designed for executing the FORTH language. The RTX2000 instruction set corresponds to FORTH primitive words (like SWAP, DROP, @, !, etc.) and to combinations of primitive words, which in some cases allows several FORTH words to be encoded into a single RTX2000 instruction. All RTX2000 instructions execute in one or two cycles and are sixteen or thirty-two bits long. The RTX2000 is commonly run at 8MHz to 10MHz, which makes it fast for real time control applications.

The RTX2000 chip includes two onboard stacks (256 words deep): one is used for parameters and the other mainly for subroutine returns. This allows stack operations like SWAP, DROP, etc. to be performed without an external memory access. Likewise, subroutine returns are done using the onboard return stack, making subroutine overhead quite low. In fact, most RTX2000 instructions can be coded to perform a return as part of the instruction, so returns often don't require any extra code.
The RTX2000 has several onboard peripherals, including three timer/counters and an interrupt controller. For more information on the RTX2000, consult the data sheet from Harris Semiconductor. Potential users are strongly advised to investigate Harris' plans for future support of the RTX2000 family before committing to the chip.

fc should also be compatible with the Harris RTX2010RH, which is sold as an ASIC. The RTX2010RH features higher radiation tolerance than the RTX2000, a barrel shifter, and a multiply accumulate circuit. Support for RTX2010 specific instructions can be added using the macro capability of fc.

Running fc

fc is invoked from the CLI (AMIGA) or MS-DOS by typing:

    fc {-lreoxsd} {-iaaa} {-tbbb} filename

fc recognizes nine command line options: l, x, e, o, r, s, d, i and t. The l, x, s, d, i, and t options can be used in any combination along with one of the other three options (e, o, or r). All options must precede the source file name. The source file name is used as the base name for generating output files; if the source file has an extension, the extension is removed when forming the output file names.

    l   writes a disassembled listing of the program into a file named filename.lst.
    e   produces two EPROM output files named filename.low and filename.hi instead of the default object code output.
    r   also generates EPROM files, but the data is in a nonstandard bit shuffled format.
    o   produces an ascii/hex output file named filename.ols.
    x   generates a definition file named filename.x, which other programs can use to access words and variables in the compiled program.
    s   prints a symbol table into filename.sym.
    d   turns on conditional compilation.
    i   specifies a directory for include files.
    t   specifies a directory for temporary files.

The l option generates an object code listing along with the corresponding FORTH source code.
This allows the programmer to check how well the code was optimized. An fc listing file will often include a single line with thirty-two bits of code. This happens for long literal instructions (which are 32 bits long) and when optimizations bail out due to a missing key instruction (often an alu operation, fetch, or store). The listing file also includes any error messages.

The e option generates object code for hi and low byte EPROMs. The output format is ASCII/HEX compatible with DATAIO format 51 (or 56) and is produced only if no errors occurred during compilation. This format consists of a start code, one or more address declarations and data blocks, followed by a stop code and checksum. After compilation, fc will ask the user for the EPROM base address. This address is subtracted from the code address to produce an EPROM relative address for the output files. (If the EPROMs are at 0x8000 hex and code starts at 0x8a00, the EPROM files would have address 0xa00 hex.) The r option generates the same type of files, except that the data for each byte is bit reversed.

The o option generates object code in an ascii/hex format that is compatible with the load block format described later. Each 16 bit entity in the output file is represented by a four digit hexadecimal number. The format consists of a configuration number (equal to the compiler version), start address, data word count, control checksum, object code, and block checksum. As with the other output formats, code is only produced if there are no compilation errors.

The x option produces a file with information about the symbols used in the program. This information includes the addresses of all defined words and variables that are preceded by the "xlink" label. The "xcode" and "xheap" labels cause the next available address for code and/or variable allocation to be written to the file. The information is provided in a form that is compatible with fc.
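The EPROM address arithmetic and the hi/low byte split used by the e and r options can be pictured with the short Python sketch below. It is an illustration, not code from fc: the function names split_for_eproms and reverse_bits are hypothetical, the DATAIO format 51/56 framing (start code, address records, stop code, checksum) is left out, and it assumes each EPROM image simply receives one byte per 16 bit code word starting at the EPROM relative address. The r option's bit reversal is modelled by mirroring the eight bits of each byte.

```python
def reverse_bits(byte):
    """Mirror the 8 bits of a byte, e.g. 0b00010010 -> 0b01001000."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (byte & 1)  # move the lowest bit of byte into result
        byte >>= 1
    return result


def split_for_eproms(code_start, words, eprom_base, bit_reverse=False):
    """Split 16 bit code words into hi and low byte EPROM images (sketch).

    Returns (relative_start, hi_bytes, low_bytes). relative_start is the code
    address minus the EPROM base address. With bit_reverse=True (the r option)
    every byte is bit reversed; otherwise bytes are copied as-is (the e option).
    """
    if code_start < eprom_base:
        raise ValueError("code starts below the EPROM base address")
    relative_start = code_start - eprom_base
    hi = bytearray()
    low = bytearray()
    for word in words:
        h, l = (word >> 8) & 0xFF, word & 0xFF   # high byte -> .hi, low byte -> .low
        if bit_reverse:
            h, l = reverse_bits(h), reverse_bits(l)
        hi.append(h)
        low.append(l)
    return relative_start, bytes(hi), bytes(low)
```

With the numbers from the example in the text, split_for_eproms(0x8a00, words, 0x8000) reports a relative start address of 0xa00.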
Because the .x file is written in fc syntax, another fc program can use a #include directive to include it and reference all the words and variables qualified with "xlink" in the original program. If the xcode and xheap qualifiers are used in the original program, any new variables and code produced will fit contiguously with those produced by the first program. This allows programs to be compiled a piece at a time. For example, some code could be developed for EPROM and some for RAM; the code developed for RAM could use this feature to call routines in EPROM.

The s option produces a file containing the symbol table. Each symbol is listed along with its hexadecimal value. A '*' marks words and variables that are qualified with the "xlink" label.

The d option compiles all lines in the input that start with a '#' followed by a space. These lines are ignored when the d option is not specified. This allows debug code to be inserted into a program and included or left out at compile time.

The i option specifies a directory to be searched for include files. The directory name should immediately follow the i without any spaces.

The t option specifies a directory for temporary files. The directory name should immediately follow the t without any spaces.

Environment

fc supports two environment variables, FC_TEMP and FC_INCLUDE. These environment variables define the path names for temporary files and include files respectively. Unfortunately, with the AMIGA version the variables must be set using the Aztec "set" command supplied with the compiler instead of the environment commands provided with AmigaDos 1.3. Setting the environment variables is not required, but can make development more convenient and faster.

The FC_TEMP variable should be set to a directory on either the hard disk or ram disk. If this variable is not set, temporary files will be written to the current working directory.
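The directory rules for temporary and include files can be summarized in a small Python sketch. It assumes, as stated for the i and t options, that a directory given on the command line takes precedence over the corresponding environment variable. The function names are hypothetical, os.path.join stands in for whatever path joining fc really does (AmigaDos names such as ram: behave differently), and the exists parameter is only there so the lookup can be tested without touching the disk.

```python
import os


def temp_directory(t_option=None, env=None):
    """-t overrides FC_TEMP; with neither, the current directory is used."""
    env = os.environ if env is None else env
    return t_option or env.get("FC_TEMP") or os.curdir


def find_include(name, i_option=None, env=None, exists=os.path.exists):
    """Return the path fc would open for '#include name', or None (sketch)."""
    env = os.environ if env is None else env
    candidates = [name]                       # the current directory is searched first
    search_dir = i_option or env.get("FC_INCLUDE")  # -i overrides FC_INCLUDE
    if search_dir:
        candidates.append(os.path.join(search_dir, name))
    for path in candidates:
        if exists(path):
            return path
    return None
```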
The FC_INCLUDE variable can be set to a directory containing a collection of include files. This directory will be searched for an include file if the file cannot be found in the current working directory. If an include search directory is specified with the i option, it overrides the directory specified by the FC_INCLUDE environment variable. Likewise, if a temporary file directory is specified with the t option, it overrides the directory specified by the FC_TEMP environment variable.

fc Preprocessor

When fc is invoked, it processes the input file through a preprocessor that expands all include files and macros and produces a temporary file used as input to the compiler. The preprocessor also handles conditional compilation, removes comments, and inserts filename/line number stamps used for producing error messages.

The preprocessor reserves the '#' character for commands and conditional compilation. There are two preprocessor commands: one for including files and another for macro definitions. Either command must start in column one, and the keyword (include or macro) must be in lower case letters. The entire command must fit on a single line.

The include command inserts the contents of another file into the program. The filename follows "#include ", must not contain any spaces, and may be qualified to access files outside the current directory. The current directory is searched for the file first. If it is not found there, the directory specified by the i option or the FC_INCLUDE environment variable is searched.

The macro command takes the first word following #macro as the macro name and the remainder of the line as the replacement text. From then on, whenever the macro name is encountered, the replacement text is substituted. Macro substitutions are case sensitive. Text enclosed in double quotes and macro replacement text are not searched for macros.
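The macro rules can be illustrated with a minimal Python sketch of the #macro part of the preprocessor. It follows the stated rules: the name is the first word after #macro, the rest of the line is the replacement, matching is case sensitive, quoted text is skipped, and replacement text is not rescanned. It treats a macro name as a whole whitespace-delimited token, which is an assumption, since the text does not say whether a name can match inside a longer word. #include, comments, and conditional lines are not handled, and the function names are hypothetical.

```python
import re

# One token: a double-quoted string (allowing \" escapes), a run of whitespace,
# a run of other characters, or a stray unmatched quote.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|\s+|[^\s"]+|"')


def expand_line(line, macros):
    out = []
    for tok in TOKEN.findall(line):
        if tok.startswith('"') or tok.isspace():
            out.append(tok)                   # quoted text is never searched
        else:
            out.append(macros.get(tok, tok))  # exact, case sensitive match; no rescan
    return "".join(out)


def preprocess_macros(lines):
    macros = {}
    result = []
    for line in lines:
        if line.startswith("#macro "):        # must start in column one, lower case
            parts = line[len("#macro "):].split(None, 1)
            if parts:
                macros[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        else:
            result.append(expand_line(line, macros))
    return result
```

For example, after "#macro nip swap drop" the line ': f nip "nip" ;' becomes ': f swap drop "nip" ;', leaving the quoted "nip" alone.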
Examples of the include and macro commands:

    #include filename
    #macro name replacement text
    #include monitor.x
    #macro nip swap drop

Comments start with a '(' character and end with the corresponding ')'. Any number of matched pairs of '(' and ')' may appear within a comment. Comments may not begin or end on a line with a #macro command.

Macros and include files are used to extend the capabilities of the fc compiler. For example, fc does not directly recognize RTX2000 register instructions such as pc@, mr!, and r>, yet these instructions are easily added through the use of macros. Code libraries and files filled with standard macros can be developed and accessed easily with the include command. The i option or the FC_INCLUDE environment variable allows a single directory to be set up as a repository of macro definitions and code libraries.

In addition to include and macro, the '#' character is also used for conditional compilation. A line in a source file can be made conditional by inserting a '#' as the first character of the line. At least one space or tab should follow the '#' before any of the source code. If the program is compiled with the d option, the conditional lines are included in the program; otherwise they are left out.

fc Memory Allocation

The RTX2000 uses absolute addressing for branches and variable references, and the compiler fixes the code and variable locations at compile time. Code is compiled starting at a user defined address (default zero) and grows into higher memory. Variables are stored in an area referred to as the heap, which starts in high memory and grows downward into low memory. The heap starting point can also be defined (default 10000 hex). When a variable is defined, the current heap value is decremented (once for characters, twice for words) to provide the variable address.
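A minimal Python model of this downward-growing heap is shown below. The Heap class and its methods are hypothetical. Alignment of 16 bit variables to even addresses is modelled by rounding down, which is an assumption about how fc does it, and the range check is added by the sketch only, since fc itself gives no warning when memory areas collide.

```python
class Heap:
    """Sketch of fc-style variable allocation from the top of memory down."""

    def __init__(self, top=0x10000):
        self.top = top  # one past the highest address; fc's default is 10000 hex

    def allocate(self, nbytes, align_even=True):
        self.top -= nbytes                   # the heap grows downward
        if align_even and self.top % 2:
            self.top -= 1                    # assumption: round down to an even address
        if self.top < 0:                     # sketch-only guard, fc does not check this
            raise MemoryError("heap ran below address 0")
        return self.top

    def variable(self):                      # VARIABLE: 1 word
        return self.allocate(2)

    def cvariable(self):                     # CVARIABLE: 1 byte
        return self.allocate(1, align_even=False)

    def array(self, n):                      # n ARRAY: n words
        return self.allocate(2 * n)

    def carray(self, n):                     # n CARRAY: n bytes
        return self.allocate(n, align_even=False)
```

Allocating a VARIABLE, a CVARIABLE and another VARIABLE from the default heap gives 0xFFFE, 0xFFFD and 0xFFFA; the last one skips 0xFFFB to stay on an even address.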
Variable allocation also ensures that all 16 bit variables are assigned even addresses. Typically the program starting address is set to the lowest available memory address, and the initial heap value is set to one plus the highest memory address. fc does not support paged memory, so all variables and code are limited to a single 64K byte page.

Reserved Words and Characters

The words listed below are recognized by the compiler as special and cannot be used as word or variable names. These words also correspond to the operations supported by the compiler.

    ; : ) code heap constant variable cvariable array carray xvariable xword nop ucode
    again begin drop dup else if ?dup_if next over repeat swap then until while nop exit not of(
    0< 2* 2*c cU2/ c2/ U2/ 2/ N2* N2*c D2* D2*c cUD2/ cD2/ UD2* D2/
    + - and or xor nor nand +c -c xnor
    g@ g! u@ u! c@ c! @ ! @+ @- c@+ c@- !+ c!+ !- c!-
    ['] , { } byte word xcode xheap xlink

The '#' character is reserved for preprocessor commands, conditional compilation, and use as a character inside a string definition. The '"' character is reserved for strings and should not be used in word and variable names.

Numbers and Strings

fc recognizes base 10 and base 16 (hex) numbers. Numbers preceded by 0x are assumed to be hex, and all other numbers are assumed to be base ten.

fc also recognizes strings. Strings may be included in word definitions and must be enclosed in double quotes (" "). A string may not extend over more than one line of the source file. Special characters may be embedded in strings using the following codes:

    \n   Carriage Return, Linefeed
    \t   Tab character
    \b   Backspace character
    \r   Carriage Return
    \f   Form Feed
    \\   Backslash
    \"   Double quote

Word, Variable and Constant names

fc supports user defined names up to 32 characters long. In general, all names should start with a non numeric character and be kept to 20 characters or less.
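The number and string rules can be sketched in Python as follows. parse_number and encode_string are hypothetical helpers. encode_string builds the in-memory form described for the "String Text" statement (a count byte followed by the characters), expanding the escape codes listed above. Treating 0X like 0x and keeping unknown escapes literally are assumptions, not documented fc behaviour.

```python
# Escape codes from the table above; \n expands to two bytes (CR, LF).
ESCAPES = {"n": b"\r\n", "t": b"\t", "b": b"\b", "r": b"\r",
           "f": b"\f", "\\": b"\\", '"': b'"'}


def parse_number(token):
    """0x prefix means hex, anything else is decimal (0X accepted by assumption)."""
    if token[:2] in ("0x", "0X"):
        return int(token[2:], 16)
    return int(token, 10)


def encode_string(text):
    """Return a count byte followed by the string's bytes, escapes expanded."""
    body = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in ESCAPES:
            body += ESCAPES[text[i + 1]]
            i += 2
        else:
            body += text[i].encode("ascii")  # unknown escapes stay literal (assumption)
            i += 1
    if len(body) > 255:
        raise ValueError("string too long for a one byte count")
    return bytes([len(body)]) + bytes(body)
```

Because \n becomes a carriage return and a linefeed, the source text Hi\n yields a count of 4 rather than 3.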
As a special case, a name can start with a number provided that the second character is not a number or an 'x' or 'X'.

fc Syntax

An fc program consists of memory definitions, variable definitions, constant definitions, data definitions, x list control, and word definitions.

Memory definitions tell the compiler where to place code and variables. They have the form:

    number CODE   ( sets next code output to address = number )
    number HEAP   ( sets heap = number )

The program may have any number of code and heap definitions, but typically there will be only one heap definition, and code definitions will be used at the start of the program and when words have to be compiled to specific addresses (like interrupt routines). fc is very stupid and will allow you to overwrite code and variables without the slightest warning, so be careful. Note that for each code definition, fc will start a new output section.

Variable definitions tell the compiler to allocate memory for variables and arrays. They have the form:

    VARIABLE name          ( declares name as a variable and allocates 1 word )
    CVARIABLE name         ( same as above, but allocates 1 byte )
    number XVARIABLE name  ( assumes a variable name is at address = number )
    number ARRAY name      ( declares and allocates number words for name )
    number CARRAY name     ( declares and allocates number bytes for name )

The only difference between these definitions is the amount of memory allocated. Once a variable is declared and allocated, fc knows no difference between a cvariable, variable, array, carray, or xvariable. The programmer is responsible for ensuring that each variable has the proper allocation for any operations performed in the program.

Constant definitions tell the compiler about different kinds of constants that can be used.
They have the form:

    number CONSTANT name   ( declare name as a constant = number )
    number XWORD name      ( declare name as a word at address = number )
    number UCODE name      ( declare name as a machine instruction = number )

Constant definitions allow the user to use symbolic names in place of actual numbers. Whenever a constant is encountered in a FORTH word definition, fc will generate code to push its declared value onto the parameter stack. Whenever an xword is encountered, fc will generate code for a subroutine call to the declared address. When a ucode is encountered, the declared value is substituted directly as a machine instruction.

Data definitions tell the compiler to place tables of words or bytes into memory for use by programs. They have the form:

    WORD name { datalist }   ( place datalist into memory as words )
    BYTE name { datalist }   ( place datalist into memory as bytes )

where datalist is a list of constants separated by one or more spaces and enclosed in braces. In the case of a WORD table, the datalist may also include variable names or word names, in which case the address of the variable or word is placed in the list. (Variables must be declared before they are included in the list.) For both word and byte tables, the table name may be used by a program to push the address of the table onto the stack (just like a variable name).

X list control words control generation of the link file that is produced when fc is invoked with the -x option. The link file is referred to as a .x file (its name is the base source file name followed by ".x").
The x list control words appear below:

    XCODE   ( causes a CODE statement to be printed in the .x file )
    XHEAP   ( causes a HEAP statement to be printed in the .x file )
    XLINK   ( causes the next defined symbol* to be printed in the .x file )

    *symbol should be a word, variable, or data definition

When XCODE appears in the source file, the .x file will contain a CODE statement that sets the code start to the next available location at the end of compilation. When XHEAP appears in the source file, the .x file will contain a HEAP statement that sets the heap to the next available memory location at the end of compilation. When XLINK appears before a word, variable, or data definition, it causes the address of the following symbol to be included in the .x file as an XWORD or XVARIABLE definition.

The x list control words should be used outside word and data definitions. The XCODE and XHEAP words need only appear once in a file to generate the CODE and HEAP statements in the .x file. The XLINK word will typically appear immediately before the word, variable, or data definition to be installed in the .x file.

Word definitions have the same format as standard FORTH words. They start with a colon followed by the word name and a body of statements, and end with a semicolon. Word definitions do not have to be ordered in a source file, since words may be called before they are defined. fc recognizes only a small number of basic FORTH words (called statements here), summarized below:

    if ..statements.. then
    if ..statements.. else ..statements.. then
    ?dup_if ..statements.. then
    ?dup_if ..statements.. else ..statements.. then
    for ..statements.. next
    begin ..statements.. again
    begin ..statements.. until
    begin ..statements.. while ..statements.. repeat
    exit                         ( return to calling word )
    drop swap dup over
    nop                          ( no operation )
    not                          ( ones complement )
    + - +c -c xnor nand xor or   ( alu operations )
    g@                           ( fetch from RTX2000 ASIC bus )
    g!                           ( store to RTX2000 ASIC bus )
    u@                           ( fetch from RTX2000 user memory space )
    u!                           ( store to RTX2000 user memory space )
    c@ c! @ !                    ( character and word fetch and store )
    @+ c@+ @- c@-                ( character/word fetch with auto increment/decrement )
    !+ c!+ !- c!-                ( character/word store with auto increment/decrement )
    0< 2* 2*c                    ( shift instructions )
    cU2/ c2/ U2/ 2/ N2* N2*c D2* D2*c cUD2/ cD2/ UD2* D2/
    of(                          ( indicates RTX2000 streamed instruction )
    [']                          ( pushes address of following word onto stack )
    "text"                       ( defines string, pushes string address onto stack )
    ,                            ( causes immediate code production )

Many statements are compatible with the simple words in the FORTH-83 standard. The compiler also supports RTX2000 specific instructions, such as streamed instructions using OF( (a closing right parenthesis must be included to mark the end of the streamed instruction), and the u!, u@, g!, and g@ memory and ASIC bus access statements. fc also supports all the RTX2000 shift instructions (see the RTX2000 data sheet or the Programmer's Reference Manual from Harris). Also supported is the comma operator, which causes the compiler to start a new RTX2000 machine instruction with the statement that follows. This feature is sometimes useful for writing optimized programs.

Many simple FORTH words are easily generated using macros; see the accompanying programs for examples. Most RTX2000 compilers and interpreters directly support a number of RTX2000 specific words for manipulating registers on the processor's internal ASIC bus. These words are not directly supported by fc; however, the macro feature allows programmers to define macros for these instructions. Typically a file filled with these definitions is referenced using an include command in the program source file. Likewise, support for RTX2010 features can be added using macro commands.

The basic statements (words) supported by the compiler are described below.
Stack diagrams are also provided to show the state of the parameter stack before and after execution of each statement.

if ..statements.. then
    The top value of the parameter stack is popped and evaluated. If the value is nonzero, ..statements.. are executed.

if ..statementsA.. else ..statementsB.. then
    The top value of the parameter stack is popped and evaluated. If the value is nonzero, ..statementsA.. are executed; otherwise ..statementsB.. are executed.

?dup_if ..statements.. then
    If the top value of the parameter stack is nonzero, ..statements.. are executed. Otherwise, the top value is popped and discarded.

?dup_if ..statementsA.. else ..statementsB.. then
    If the top value of the parameter stack is nonzero, ..statementsA.. are executed. Otherwise, the stack is popped, the zero is discarded, and ..statementsB.. are executed.

begin ..statements.. again
    Executes ..statements.. over and over in an endless loop.

begin ..statements.. until
    ..statements.. are executed, then the top of the parameter stack is popped by "until" and evaluated. If the value is nonzero, execution continues with the statement following "until"; otherwise ..statements.. are executed again.

begin ..statementsA.. while ..statementsB.. repeat
    ..statementsA.. are executed, then "while" pops the top of the parameter stack. If the value is zero, execution continues with the statements after "repeat"; otherwise ..statementsB.. are executed, then ..statementsA.. are executed again and the "while" test is performed again.

exit ( -- )
    Execution is returned to the calling word.

drop ( x -- )
    The top of the parameter stack is popped and discarded.

swap ( x y -- y x )
    The top two items on the parameter stack are exchanged.

dup ( x -- x x )
    The top value of the parameter stack is copied and pushed onto the parameter stack.

over ( x y -- x y x )
    The second value of the parameter stack is copied and pushed onto the parameter stack.

nop ( -- )
    No operation is performed.
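The stack diagrams and the less familiar control structures can be modelled in a few lines of Python. This is an illustrative sketch, not how fc compiles them: ParamStack, qdup_if, and begin_while_repeat are hypothetical names, and statement sequences are passed in as Python functions. The model keeps the nonzero value on the stack in the ?dup_if case, which follows from the description that only a zero is popped.

```python
class ParamStack:
    """Parameter stack as a Python list; the last item is the top."""

    def __init__(self, *items):
        self.items = [x & 0xFFFF for x in items]

    def push(self, x):
        self.items.append(x & 0xFFFF)     # cells are 16 bit words

    def pop(self):
        return self.items.pop()

    def drop(self):                       # drop ( x -- )
        self.pop()

    def dup(self):                        # dup  ( x -- x x )
        self.push(self.items[-1])

    def swap(self):                       # swap ( x y -- y x )
        self.items[-1], self.items[-2] = self.items[-2], self.items[-1]

    def over(self):                       # over ( x y -- x y x )
        self.push(self.items[-2])


def qdup_if(stack, then_part, else_part=None):
    """?dup_if ..then_part.. [else ..else_part..] then"""
    if stack.items[-1] != 0:
        then_part(stack)                  # the nonzero value stays on the stack
    else:
        stack.drop()                      # the zero is popped and discarded
        if else_part is not None:
            else_part(stack)


def begin_while_repeat(stack, part_a, part_b):
    """begin ..part_a.. while ..part_b.. repeat"""
    while True:
        part_a(stack)
        if stack.pop() == 0:              # "while" pops the flag
            break                         # zero: continue after "repeat"
        part_b(stack)
```

For instance, starting with 3 on the stack, a loop whose part A is dup and whose part B subtracts one runs part B three times and leaves 0 on the stack.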
not ( x -- y )
    The top of the parameter stack is replaced by its ones complement, y = not(x).

+ ( a b -- c )
    The top two items on the parameter stack are replaced by their sum, c = a + b.

- ( a b -- c )
    The top two items on the parameter stack are replaced by their difference, c = a - b.

+c ( a b -- c )
    The top two items on the parameter stack are replaced by their sum plus the value of the carry bit, c = a + b + carry bit.

-c ( a b -- c )
    The top two items on the parameter stack are replaced by their difference minus the ones complement of the carry bit, c = a - b - not(carry).

xnor ( a b -- c )
    The top two items on the parameter stack are replaced by the result of a logical exclusive nor operation, c = a xnor b.

nand ( a b -- c )
    The top two items on the parameter stack are replaced by the result of a logical nand operation, c = a nand b.

xor ( a b -- c )
    The top two items on the parameter stack are replaced by the result of a logical exclusive or operation, c = a xor b.

or ( a b -- c )
    The top two items on the parameter stack are replaced by the result of a logical or operation, c = a or b.

g g@ ( -- x )
    Fetches word x from address g on the processor's ASIC bus and pushes it onto the parameter stack. g is a constant between zero and thirty-one. See the RTX2000/RTX2010 data sheet for register assignments.

g g! ( x -- )
    Pops the top of the parameter stack and stores it into address g of the processor's ASIC bus. g is a constant between zero and thirty-one. See the RTX2000/RTX2010 data sheet for register assignments.

u u@ ( -- x )
    Fetches word x from user space address u and pushes it onto the parameter stack. u is a constant between zero and thirty-one.

u u! ( x -- )
    Pops the top of the parameter stack and stores it into address u of the user space. u is a constant between zero and thirty-one.

@ ( a -- d )
    Replaces the top of the parameter stack with the word fetched from memory address a.

c@ ( a -- d )
    Replaces the top of the parameter stack with the byte fetched from memory address a.

! ( d a -- )
    Pops two items from the top of the parameter stack and stores word d into memory address a.

c! ( d a -- )
    Pops two items from the top of the parameter stack and stores data d as a byte into memory address a.

@+ ( a -- d a+2 )
    Pops address a from the top of the parameter stack and fetches word d from memory address a. Pushes d and a+2 onto the parameter stack.

c@+ ( a -- d a+1 )
    Pops address a from the top of the parameter stack and fetches byte d from memory address a. Pushes d and a+1 onto the parameter stack.

@- ( a -- d a-2 )
    Pops address a from the top of the parameter stack and fetches word d from memory address a. Pushes d and a-2 onto the parameter stack.

c@- ( a -- d a-1 )
    Pops address a from the top of the parameter stack and fetches byte d from memory address a. Pushes d and a-1 onto the parameter stack.

!+ ( d a -- a+2 )
    Pops the top two items off the parameter stack. Stores word d into memory address a, then pushes a+2 onto the parameter stack.

c!+ ( d a -- a+1 )
    Pops the top two items off the parameter stack. Stores d as a byte into memory address a, then pushes a+1 onto the parameter stack.

!- ( d a -- a-2 )
    Pops the top two items off the parameter stack. Stores word d into memory address a, then pushes a-2 onto the parameter stack.

c!- ( d a -- a-1 )
    Pops the top two items off the parameter stack. Stores d as a byte into memory address a, then pushes a-1 onto the parameter stack.

0< ( a -- b )
    The top of the parameter stack is replaced by a value obtained by extending the most significant bit to every bit in the word.

2* ( a -- b )
    The top of the parameter stack is shifted left by one bit. Zero is shifted into the lsb and the msb is shifted into the carry bit.

2*c ( a -- b )
    The top of the parameter stack is shifted left by one bit. The carry bit is shifted into the lsb and the msb is shifted into the carry bit.

cU2/ ( a -- b )
    The top of the parameter stack is shifted right by one bit. The carry bit is shifted into the msb and the lsb is discarded. The carry bit is set to zero.
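The fetch and store statements with auto increment and decrement can be modelled with a byte array, as in the Python sketch below. The names are hypothetical, and storing 16 bit words high byte first is an assumption based on the Motorola type byte addressing mentioned for strings; word accesses at odd addresses are not modelled.

```python
class Memory:
    """64K bytes; 16 bit words stored high byte first (assumed Motorola order)."""

    def __init__(self, size=0x10000):
        self.data = bytearray(size)

    def fetch(self, a):                   # @   ( a -- d )
        return (self.data[a] << 8) | self.data[a + 1]

    def store(self, d, a):                # !   ( d a -- )
        self.data[a] = (d >> 8) & 0xFF
        self.data[a + 1] = d & 0xFF

    def cfetch(self, a):                  # c@  ( a -- d )
        return self.data[a]

    def cstore(self, d, a):               # c!  ( d a -- )
        self.data[a] = d & 0xFF


# Each statement takes the parameter stack as a Python list (top = last item).
def fetch_plus(mem, st):                  # @+  ( a -- d a+2 )
    a = st.pop()
    st.extend([mem.fetch(a), (a + 2) & 0xFFFF])


def cfetch_minus(mem, st):                # c@- ( a -- d a-1 )
    a = st.pop()
    st.extend([mem.cfetch(a), (a - 1) & 0xFFFF])


def store_plus(mem, st):                  # !+  ( d a -- a+2 )
    a, d = st.pop(), st.pop()             # address is on top, data below it
    mem.store(d, a)
    st.append((a + 2) & 0xFFFF)


def cstore_minus(mem, st):                # c!- ( d a -- a-1 )
    a, d = st.pop(), st.pop()
    mem.cstore(d, a)
    st.append((a - 1) & 0xFFFF)
```

Because each statement leaves the updated address on the stack, a loop can walk through a table with @+ or !+ without recomputing the address each time.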
c2/ ( a -- b )
    The top of the parameter stack is shifted right by one bit. The carry bit is shifted into the msb and the lsb is shifted into the carry bit.

U2/ ( a -- b )
    The top of the parameter stack is shifted right by one bit. Zero is shifted into the msb and the lsb is discarded. The carry bit is set to zero.

2/ ( a -- b )
    The top of the parameter stack is shifted right by one bit. The msb remains unchanged and the lsb is discarded. The carry bit is set to the value of the msb.

N2* ( a x -- b x )
    The second value from the top of the parameter stack is shifted left by one bit. Zero is shifted into the lsb. The carry bit is not changed.

N2*c ( a x -- b x )
    The second value from the top of the parameter stack is shifted left by one bit. The carry bit is shifted into the lsb. The carry bit is not changed.

D2* ( a b -- c d )
    The top two items on the parameter stack are shifted left one bit together as a thirty-two bit word. The msb of the top item is shifted into the carry bit. Zero is shifted into the lsb of the second item.

D2*c ( a b -- c d )
    The top two items on the parameter stack are shifted left one bit together as a thirty-two bit word. The carry bit is shifted into the lsb of the second item and the msb of the top item is shifted into the carry bit.

cUD2/ ( a b -- c d )
    The top two items on the parameter stack are shifted right one bit together as a thirty-two bit word. The carry bit is shifted into the msb of the top item and the lsb of the second item is discarded. The carry bit is set to zero.

cD2/ ( a b -- c d )
    The top two items on the parameter stack are shifted right one bit together as a thirty-two bit word. The carry bit is shifted into the msb of the top item and the lsb of the second item is shifted into the carry bit.

UD2* ( a b -- c d )
    The top two items on the parameter stack are shifted right one bit together as a thirty-two bit word. Zero is shifted into the msb of the top item and the lsb of the second item is discarded. The carry bit is set to zero.
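The carry behaviour of the shift statements is easy to get wrong, so here is a Python sketch of several of them. Each function takes a 16 bit value and the carry bit and returns the new value and carry; d_two_star takes the second item a and the top item b, with b as the high word. The function names are hypothetical and the bit movements follow the descriptions above; statements whose description leaves the carry unspecified (such as 0<) are not included. Some functions accept carry without using it, only so that all of them share one signature.

```python
MASK = 0xFFFF  # statements operate on 16 bit words


def two_star(a, carry):        # 2*   zero -> lsb, msb -> carry
    return (a << 1) & MASK, (a >> 15) & 1


def two_star_c(a, carry):      # 2*c  carry -> lsb, msb -> carry
    return ((a << 1) | carry) & MASK, (a >> 15) & 1


def c_u_two_slash(a, carry):   # cU2/ carry -> msb, lsb discarded, carry cleared
    return (a >> 1) | (carry << 15), 0


def c_two_slash(a, carry):     # c2/  carry -> msb, lsb -> carry
    return (a >> 1) | (carry << 15), a & 1


def u_two_slash(a, carry):     # U2/  zero -> msb, lsb discarded, carry cleared
    return a >> 1, 0


def two_slash(a, carry):       # 2/   msb kept, lsb discarded, carry = msb
    msb = a & 0x8000
    return (a >> 1) | msb, msb >> 15


def d_two_star(a, b, carry):   # D2*  ( a b -- c d ), b (top) is the high word
    value = ((b << 16) | a) << 1            # zero goes into the lsb of a
    return value & MASK, (value >> 16) & MASK, (value >> 32) & 1
```

Because 2*c shifts the old carry into the lsb, applying 2* to the low word and then 2*c to the high word doubles a 32 bit value; D2* does the same in one statement.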
D2/ ( a b -- c d )
    The top two items on the parameter stack are shifted right one bit together as a thirty-two bit word. The msb of the top item remains unchanged and the lsb of the second item is discarded. The carry bit is set to the state of the msb of the top item.

for ..statements.. next
    "for" pops a loop count (n) off the parameter stack, and ..statements.. are executed n times. The iteration count minus one can be obtained by reading the top value of the return stack (using r@). (Note: return instructions should not be executed inside for...next loops, since the iteration count is stored on the return stack.)

of( ( n -- )
    Indicates that an instruction, or instructions, should be repeated. A closing right parenthesis is required to mark the range of instructions to be repeated. The instruction(s) must compile into a single 16 bit RTX2000 machine instruction, or an error will occur. The repetition count, n, is popped off the parameter stack, and the indicated instruction(s) are executed n + 1 times.

['] user_defined_word
    When ['] precedes a user defined word, the address of the word is pushed onto the parameter stack instead of the word being called as a subroutine.

"String Text" ( -- a )
    Pushes the address of the defined string onto the parameter stack. The first byte of the string is a count of the number of characters that follow. (This assumes that the RTX2000 is configured for Motorola type byte addressing.)

,
    The comma causes the compiler to start a new RTX2000 machine instruction with the statement that follows. Commas allow programmers to explicitly show how a program, or a portion of a program, should be partitioned into RTX2000 instructions, giving the user direct control over which statements are combined. Commas are used to force a desired optimization that the compiler may not be able to achieve without the help of the programmer.
This feature is useful for writing highly optimized code when the programmer has a good understanding of the RTX2000 instruction set.

In addition to the statements (words) defined above, a program may refer to user defined variables and words. All variables must be defined before they are used, because fc assumes that any undefined reference is a subroutine call (a user defined word). Words do not have to be defined before they are used, but they must be defined somewhere in the program being compiled.

Comments may be placed anywhere in the program file; a comment begins with a left parenthesis that is preceded by a space or a newline. Comments may extend over several lines, and matching pairs of parentheses may be included inside them. A comment ends at the right parenthesis that matches the left parenthesis that started it.

fc Code Optimization

fc performs optimization at the source code level by collecting the greatest number of FORTH statements that fit into one of its syntax templates (coded using yacc). Many of these templates correspond directly to single RTX2000 instructions, while others provide a default path in case single instruction optimization can't be achieved. As the compiler collects FORTH statements one at a time to fit a given template, other templates must be available so that code can be produced at any point (for example, when a call statement is encountered, which cannot be combined with other statements). The yacc generated parser ensures that the longest template will be chosen, incorporating as many FORTH instructions as feasible (given a particular template list) into a single RTX2000 instruction. This approach does not necessarily produce the shortest RTX2000 program, but it is easy to code and makes new optimizations easy to incorporate.

The comma operator can be used to force the compiler to stop looking for additional statements to pack into the current machine instruction. The statements following the comma always start a new RTX2000 machine instruction.
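The shift words described earlier (c2/ through D2/) can be cross-checked against a small host-side model. The following is a sketch in C, not code taken from fc or from Harris: the names Machine, c2_div, u2_div, two_div, d2_mul, and cd2_div are hypothetical, a cell is modeled as a uint16_t, and for the double-cell words the top item is assumed to hold the high-order half of the thirty-two bit value (which matches the descriptions, where the msb of the top item is the msb of the whole word).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of a few RTX2000 shift words.
   Assumption: for double-cell words, `top` holds the high-order half. */
typedef struct {
    uint16_t second; /* second item on the parameter stack (low half) */
    uint16_t top;    /* top of the parameter stack (high half) */
    int carry;       /* carry flag, 0 or 1 */
} Machine;

/* c2/ : carry -> msb, lsb -> carry */
static void c2_div(Machine *m) {
    int lsb = m->top & 1u;
    m->top = (uint16_t)((m->top >> 1) | (m->carry ? 0x8000u : 0u));
    m->carry = lsb;
}

/* U2/ : zero -> msb, lsb discarded, carry cleared */
static void u2_div(Machine *m) {
    m->top = (uint16_t)(m->top >> 1);
    m->carry = 0;
}

/* 2/ : msb kept (arithmetic shift), lsb discarded, carry = msb */
static void two_div(Machine *m) {
    unsigned msb = m->top & 0x8000u;
    m->top = (uint16_t)((m->top >> 1) | msb);
    m->carry = msb ? 1 : 0;
}

/* D2* : 32-bit left shift; msb of top -> carry, zero -> lsb of second */
static void d2_mul(Machine *m) {
    m->carry = (m->top & 0x8000u) ? 1 : 0;
    m->top = (uint16_t)((m->top << 1) | (m->second >> 15)); /* uses old second */
    m->second = (uint16_t)(m->second << 1);
}

/* cD2/ : 32-bit right shift; carry -> msb of top, lsb of second -> carry */
static void cd2_div(Machine *m) {
    int lsb = m->second & 1u;
    m->second = (uint16_t)((m->second >> 1) | ((m->top & 1u) << 15)); /* uses old top */
    m->top = (uint16_t)((m->top >> 1) | (m->carry ? 0x8000u : 0u));
    m->carry = lsb;
}
```

For example, starting from second = 8000 hex, top = 4001 hex, and a clear carry, d2_mul leaves top = 8003 hex, second = 0000 hex, and the carry clear, just as a left shift of the thirty-two bit value 40018000 hex would.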
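The comment rule described earlier (a left parenthesis preceded by a space or newline opens a comment, and matching pairs may be nested inside it) amounts to a depth counter. A minimal C sketch, assuming that the start of the file counts as a preceding newline; strip_comments is a hypothetical name, and string literals, which fc allows to contain parentheses, are not recognized here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies `src` to `dst`, replacing each comment with a single space.
   A comment starts at '(' preceded by a space, a newline, or the start
   of the text, and ends at the matching ')'. Nested pairs are allowed.
   `dst` must hold at least strlen(src) + 1 bytes.
   Returns 0 on success, -1 if a comment is still open at end of text.
   Simplification: string literals are not recognized. */
static int strip_comments(const char *src, char *dst) {
    int depth = 0;    /* nesting depth of the current comment */
    char prev = '\n'; /* treat start of text like a preceding newline */
    size_t j = 0;
    for (size_t i = 0; src[i] != '\0'; i++) {
        char c = src[i];
        if (depth > 0) {
            if (c == '(') {
                depth++;                      /* nested pair opens */
            } else if (c == ')' && --depth == 0) {
                dst[j++] = ' ';               /* whole comment becomes one space */
            }
        } else if (c == '(' && (prev == ' ' || prev == '\n')) {
            depth = 1;                        /* comment starts */
        } else {
            dst[j++] = c;                     /* ordinary program text */
        }
        prev = c;
    }
    dst[j] = '\0';
    return depth == 0 ? 0 : -1;
}
```

Because the left parenthesis in of( is not preceded by a space, the sketch leaves "5 of( 2* )" untouched, which is consistent with of( being a word rather than the start of a comment.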
fc incorporates limited single-token lookahead in its lexical analyzer. Whenever a short constant or a "swap" is encountered, it looks ahead to see what the next token is. If a "u@", "u!", "g@", or "g!" follows a short constant, the compiler must start a new instruction beginning with the short constant. When a "swap" that precedes an alu operation is encountered, it can be incorporated into the same instruction as the alu operation. Also, if a "swap" follows a "swap", both are ignored. This processing allows the compiler to aggressively optimize out any number of swaps that might precede an alu operation.

Default Object Code Format

Whenever fc is invoked without the e, r, or o option and no errors have occurred during compilation, fc produces an object file in the default format. This format consists of a sequence of one or more sections of code. Each section consists of a start address, a byte count, and RTX2000 instructions in binary format. The compiler produces a new section each time the code address is set using a "code" definition, and whenever the current section becomes longer than 1024 bytes. The format of the default object code file appears below. Each data entity is stored as a sixteen bit word.

    Number of Sections
    Starting address of section #1
    Number of bytes in section #1
    Data for section #1
    { Starting address of section #2 (if needed) }
    { Number of bytes in section #2 (if needed) }
    { Data for section #2 (if needed) }
    ....

DMSP Load Block Code Format

Whenever fc is invoked with the o option and no compilation errors occur, code is produced in the DMSP Load Block Format. This format consists of a series of 16 bit numbers represented in ASCII hex format. Each code section produced by the compiler is packed into an independent load block, and all load blocks are concatenated into a single file. The output file format appears below.
    Configuration Number for section #1
    Start Address for section #1
    Data Word Count for section #1
    Control Checksum for section #1 header
    Data for section #1
    Block Checksum for section #1
    { Configuration Number for section #2 (if needed) }
    { Start Address for section #2 (if needed) }
    { Data Word Count for section #2 (if needed) }
    { Control Checksum for section #2 (if needed) }
    { Data for section #2 (if needed) }
    { Block Checksum for section #2 (if needed) }
    ....

The configuration number for each load block equals the compiler version (for example, for version 1.13 the configuration number is 000D hex). The checksum algorithm is "add and rotate 1 bit right". The control (header) checksum covers the first three words in a block: the configuration number, the start address, and the word count. The block checksum covers the configuration number, the start address, the word count, the control checksum, and all the data.

Error Messages

fc produces a limited number of rather general error messages. All error messages, except for undefined word errors, are accompanied by the file name, line number, and text from the vicinity of the error. After an error has occurred, the compiler continues until it completes the file. When an error occurs within a word definition, the compiler may skip over nearby errors and not report them. All errors are included in the list file if the l option is specified. A missing colon in a word definition may cause the compiler to print many "invalid statement" errors, one for each label encountered in the word definition.

Limitations

fc stores generated code internally in a fixed size buffer. Because of the buffer's fixed length, programs should be limited to about 20K to 25K bytes or less.
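The default object file layout described earlier can be sketched as an in-memory image of sixteen bit words. This C sketch assumes a single contiguous code region, byte addressing (so each following section starts 2 bytes further on per preceding instruction word), and sections of at most 1024 bytes; build_default_object and MAX_SECTION_BYTES are hypothetical names. New sections started by "code" definitions are not modeled, and because the byte order used when the words are written to disk is not stated above, the sketch stops at the word level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SECTION_BYTES 1024u /* a section never grows past this */

/* Builds the default object image for one contiguous code region of
   `nwords` 16-bit instructions starting at byte address `start`.
   Layout: [section count], then per section [start address]
   [byte count] [data words...]. Returns the number of words written,
   or 0 if `out` (capacity `out_cap` words) is too small. */
static size_t build_default_object(uint16_t start, const uint16_t *code, size_t nwords,
                                   uint16_t *out, size_t out_cap) {
    const size_t words_per_section = MAX_SECTION_BYTES / 2;
    size_t nsections = (nwords + words_per_section - 1) / words_per_section;
    size_t needed = 1 + 2 * nsections + nwords;
    if (needed > out_cap) return 0;

    size_t k = 0;
    out[k++] = (uint16_t)nsections;
    for (size_t done = 0; done < nwords; done += words_per_section) {
        size_t n = ((nwords - done) < words_per_section) ? (nwords - done)
                                                         : words_per_section;
        out[k++] = (uint16_t)(start + 2 * done); /* section start (byte address) */
        out[k++] = (uint16_t)(2 * n);            /* section length in bytes */
        for (size_t i = 0; i < n; i++) out[k++] = code[done + i];
    }
    return k;
}
```

For example, 600 instruction words starting at address 1000 hex come out as two sections: 1024 bytes at 1000 hex followed by 176 bytes at 1400 hex.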
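The DMSP load block layout and its two checksums can be sketched in the same way. The description above only says "add and rotate 1 bit right", so this C sketch assumes that each word is added to a sixteen bit running sum starting at zero and that the sum is then rotated right by one bit. The names ck_step, checksum, and build_load_block are hypothetical, and the conversion of the words to ASCII hex is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One checksum step: add the word, then rotate the 16-bit sum right by 1.
   The add-then-rotate order and the zero starting value are assumptions. */
static uint16_t ck_step(uint16_t sum, uint16_t word) {
    uint16_t s = (uint16_t)(sum + word);
    return (uint16_t)((s >> 1) | ((unsigned)s << 15));
}

static uint16_t checksum(const uint16_t *words, size_t n) {
    uint16_t sum = 0;
    for (size_t i = 0; i < n; i++) sum = ck_step(sum, words[i]);
    return sum;
}

/* Assembles one load block: configuration number, start address,
   data word count, control checksum, data..., block checksum.
   Returns the number of words written, or 0 if `out` is too small. */
static size_t build_load_block(uint16_t config, uint16_t start,
                               const uint16_t *data, size_t ndata,
                               uint16_t *out, size_t out_cap) {
    if (ndata > 0xFFFFu) return 0;        /* count must fit in one word */
    size_t total = 4 + ndata + 1;
    if (total > out_cap) return 0;
    out[0] = config;
    out[1] = start;
    out[2] = (uint16_t)ndata;
    out[3] = checksum(out, 3);                 /* control (header) checksum */
    for (size_t i = 0; i < ndata; i++) out[4 + i] = data[i];
    out[4 + ndata] = checksum(out, 4 + ndata); /* block checksum */
    return total;
}
```

With configuration number 000D hex (version 1.13), start address 0, and a single data word 1234 hex, the sketch produces a control checksum of 2002 hex and a block checksum of 191B hex. Each word can then be written out as four hex digits, for example with printf("%04X", w).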
Revision History

fc version 1.3 was the first version intended for distribution; however, a short revision history is given below:

1.2   First revision with all EPROM output options
1.2a  Bug in l option fixed; the l option no longer hangs the program
      for outputs that start with indented lines
1.3   DMSP output option added
1.4   var keyword changed to variable
      cvar keyword changed to cvariable
      xvar keyword changed to xvariable
      inline keyword changed to ucode
      return keyword changed to exit
      (keyword changes made to increase compatibility with the Harris
      TFORTH compiler)
      Bug fix in parser to support heap definition of 0x10000
1.5   ['] word added
      String support added
1.5a  IBM only: bug in string support fixed
1.5b  IBM only: bug in ['] fixed
1.6   DMSP option changed so that there are no spaces in the file
      Configuration number for this option is set to zero instead of
      being requested from the user
1.7   Fixed disassembly output for DUP d g! instructions
      Added comma feature
      Added word and byte data definitions
1.8   Corrected bug with instructions of the form DUP d alu-op
      Changed configuration word to equal the compiler version
1.9   Corrected bug with DMSP load block checksum
      Prints out the size of each output section
      Modified preprocessor so that strings are ignored in macro
      substitutions
1.10  Fixed bug with constants used in "word" statements
      Fixed preprocessor so that #, (, and ) may be used in strings
      Added xheap, xcode, and xlink statements
      Added -s option
1.11  Fixed bug with @- and c@- instructions
      Filename extensions allowed for source files
1.12  Added optimization for longlit OVER alu-op instruction
      Added environment variables
      Added conditional compilation
      Fixed bug with disassembly of d g@ OVER alu-op
      Changed command line print-out for code production
1.13  Sped up preprocessor and symbol table
      Added optimization for DUP d u@ alu-op instruction
      Allows larger programs
      Added -i and -t options
1.14  MS Windows version (wfc)
      Trailing '\' character no longer needed when specifying a path
      using environment variables
      No longer prompts user to stop after a set number of errors

References

The second reference is an excellent general reference for scientific computing, which also contains some searching and sorting algorithms. The other three references provide a good practical introduction to compiler writing using coded examples.

Aho, A. V., Sethi, R., and Ullman, J. D., Compilers: Principles, Techniques, and Tools, Addison-Wesley Publishing, Massachusetts, 1988.

Flannery, B. P., Press, W. H., Teukolsky, S. A., and Vetterling, W. T., Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, New York, 1988.

Friedman, H. G. Jr., and Schreiner, A. T., Introduction to Compiler Construction with UNIX, Prentice-Hall, New Jersey, 1985.

Kernighan, B. W., and Pike, R., The UNIX Programming Environment, Prentice-Hall, New Jersey, 1984.

Lloyd Linstrom
October 1992